{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Predicting RNA secondary structure with LinearRNA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "LinearRNA is a suite of linear-time algorithms and tools for RNA secondary structure analysis, including **LinearFold** and **LinearPartition**."
   ]
  },
  {
   "source": [
    "# Install dependency"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Install successfully\n"
     ]
    }
   ],
   "source": [
    "!pip install paddlehelix\n",
    "from IPython.display import clear_output\n",
    "clear_output()\n",
    "print(\"Install successfully\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part I: LinearFold"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**LinearFold** is the first linear-time algorithm and software for predicting RNA secondary structures. \n",
    "The LinearFold paper was accepted at ISMB, a leading conference in computational biology, and published in the journal Bioinformatics: [LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search](https://academic.oup.com/bioinformatics/article/35/14/i295/5529205)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Machine learning model\n",
    "```python\n",
    "linear_fold_c(rna_sequence, beam_size = 100, use_constraints = False, constraint = \"\", no_sharp_turn = True)\n",
    "```\n",
    "This function uses the machine learning model proposed by [CONTRAfold](https://pubmed.ncbi.nlm.nih.gov/16873527/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parameter setting\n",
    "- rna_sequence: string, the input RNA sequence whose secondary structure is predicted.\n",
    "- beam_size: int (default 100), set to 0 to turn off beam pruning.\n",
    "- use_constraints: bool (default False), enable structural constraints during prediction.\n",
    "- constraint: string (default \"\"), the constraint sequence. It only takes effect when use_constraints is True. The constraint sequence must have the same length as the RNA sequence. Each character is one of \"? . ( )\": \"?\" marks a position whose pairing is unknown, \".\" a position that must be unpaired, and \"(\" and \")\" positions that must be the left and right end of a base pair, respectively. The parentheses must be well-balanced and non-crossing.\n",
    "- no_sharp_turn: bool (default True), disable sharp turns in the prediction.\n",
    "### Return Value\n",
    "- tuple(string, float): a tuple containing the predicted structure and its folding free energy.\n",
    "\n"
   ]
  },
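  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The rules for `constraint` listed above (same length as the sequence, only `? . ( )` characters, balanced parentheses) can be checked before calling `linear_fold_c`. The cell below is a minimal sketch of such a check. `check_constraint` is a hypothetical helper written for this tutorial, not part of `pahelix`, and the short sequence it is run on is made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (not part of pahelix): validate a constraint string\n",
    "# against the rules described in the parameter list.\n",
    "def check_constraint(rna_sequence, constraint):\n",
    "    if len(constraint) != len(rna_sequence):\n",
    "        raise ValueError('constraint and sequence must have the same length')\n",
    "    depth = 0  # number of '(' that are still waiting for a ')'\n",
    "    for position, symbol in enumerate(constraint):\n",
    "        if symbol not in '?.()':\n",
    "            raise ValueError('invalid symbol %r at position %d' % (symbol, position))\n",
    "        if symbol == '(':\n",
    "            depth += 1\n",
    "        elif symbol == ')':\n",
    "            depth -= 1\n",
    "            if depth < 0:\n",
    "                raise ValueError(\"unmatched ')' at position %d\" % position)\n",
    "    if depth != 0:\n",
    "        raise ValueError(\"unmatched '(' in constraint\")\n",
    "    # With a single bracket type, balanced parentheses are also non-crossing.\n",
    "    return True\n",
    "\n",
    "# Illustrative sequence and constraint, not taken from the examples in this notebook.\n",
    "check_constraint('GGGAAACCC', '(??...??)')"
   ]
  },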
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "('..((((.(((....)))...))))....((((((............................))))))....',\n",
       " 0.4548597317188978)"
      ]
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "source": [
    "import pahelix.toolkit.linear_rna as linear_rna\n",
    "input_sequence = \"AACUCCGCCAGGCCUGGAAGGGAGCAACGGUAGUGACACUCUCUGUGUGCGUAGGUUGCCUAGCUACCAUUU\"\n",
    "linear_rna.linear_fold_c(input_sequence)"
   ]
  },
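  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first element of the returned tuple is a dot-bracket string, in which matching parentheses mark paired positions and dots mark unpaired ones. As a rough sketch of how to read it, the cell below converts such a string into a list of paired positions using a stack. `dot_bracket_to_pairs` is a hypothetical helper, not part of `pahelix`, and it reports 0-based positions, which is an assumption made here for convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper (not part of pahelix): list the base pairs encoded\n",
    "# in a well-balanced dot-bracket string as (i, j) tuples with 0-based positions.\n",
    "def dot_bracket_to_pairs(structure):\n",
    "    open_positions = []  # stack of '(' positions not yet closed\n",
    "    pairs = []\n",
    "    for position, symbol in enumerate(structure):\n",
    "        if symbol == '(':\n",
    "            open_positions.append(position)\n",
    "        elif symbol == ')':\n",
    "            # A ')' always closes the most recently opened '('.\n",
    "            pairs.append((open_positions.pop(), position))\n",
    "    return sorted(pairs)\n",
    "\n",
    "# Toy structure for illustration.\n",
    "dot_bracket_to_pairs('((...))')"
   ]
  },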
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "('..(.(((......)((........))(((......(.......).))).....))..)..............',\n",
       " -27.328358240425587)"
      ]
     },
     "metadata": {},
     "execution_count": 5
    }
   ],
   "source": [
    "# with constraints\n",
    "constraint = \"??(???(??????)?(????????)???(??????(???????)?)???????????)??.???????????\"\n",
    "linear_rna.linear_fold_c(input_sequence, use_constraints = True, constraint = constraint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Thermodynamic model\n",
    "```python\n",
    "linear_fold_v(rna_sequence, beam_size = 100, use_constraints = False, constraint = \"\", no_sharp_turn = True)\n",
    "```\n",
    "This function uses the thermodynamic model from [Vienna RNAfold](https://almob.biomedcentral.com/articles/10.1186/1748-7188-6-26).\n",
    "The parameters are the same as those of the machine learning model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "('..((((.(((....)))...))))....((((((.((((.....))))...((((...))))))))))....',\n",
       " -18.4)"
      ]
     },
     "metadata": {},
     "execution_count": 7
    }
   ],
   "source": [
    "linear_rna.linear_fold_v(input_sequence)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "('..(.(((......)((........))(((......(.......).))).....))..)..............',\n",
       " 13.4)"
      ]
     },
     "metadata": {},
     "execution_count": 8
    }
   ],
   "source": [
    "# with constraints\n",
    "linear_rna.linear_fold_v(input_sequence, use_constraints = True, constraint = constraint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part II: LinearPartition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**LinearPartition** is the first linear-time algorithm and software for computing the partition function and base pair probabilities of RNA secondary structures. The LinearPartition paper was accepted at ISMB, a leading conference in computational biology, and published in the journal Bioinformatics: [LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities](https://academic.oup.com/bioinformatics/article/36/Supplement_1/i258/5870487)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Machine learning model\n",
    "```python\n",
    "linear_partition_c(rna_sequence, beam_size = 100, bp_cutoff = 0.0, no_sharp_turn = True)\n",
    "```\n",
    "This function uses the machine learning model from [CONTRAfold](https://pubmed.ncbi.nlm.nih.gov/16873527/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parameter setting\n",
    "- rna_sequence: string, the input RNA sequence for which the partition function and base pair probabilities are calculated.\n",
    "- beam_size: int (default 100), set to 0 to turn off beam pruning.\n",
    "- bp_cutoff: float (default 0.0), only output base pairs whose probability is larger than bp_cutoff (a value between 0 and 1).\n",
    "- no_sharp_turn: bool (default True), disable sharp turns in the prediction.\n",
    "### Return Value\n",
    "- tuple(float, list): a tuple containing the partition function value and a list of base pair probabilities, each given as (i, j, probability)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(0.6399469375610352,\n",
       " [(4, 13, 0.2007068395614624),\n",
       "  (10, 22, 0.24661558866500854),\n",
       "  (11, 21, 0.2457289695739746),\n",
       "  (12, 20, 0.20926791429519653)])"
      ]
     },
     "metadata": {},
     "execution_count": 9
    }
   ],
   "source": [
    "input_sequence = \"UGAGUUCUCGAUCUCUAAAAUCG\"\n",
    "linear_rna.linear_partition_c(input_sequence, bp_cutoff = 0.2)"
   ]
  },
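  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second element of the returned tuple is a plain Python list of `(i, j, probability)` entries, so it can be post-processed with ordinary list operations. As a small sketch, the cell below ranks the entries by probability. The list is copied from the output shown above, and `most_probable_pairs` is a hypothetical helper, not part of `pahelix`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Base pair probabilities copied from the linear_partition_c output above.\n",
    "pair_probabilities = [\n",
    "    (4, 13, 0.2007068395614624),\n",
    "    (10, 22, 0.24661558866500854),\n",
    "    (11, 21, 0.2457289695739746),\n",
    "    (12, 20, 0.20926791429519653),\n",
    "]\n",
    "\n",
    "# Hypothetical helper (not part of pahelix): keep the top_k entries\n",
    "# with the highest probability, most probable first.\n",
    "def most_probable_pairs(entries, top_k=2):\n",
    "    ranked = sorted(entries, key=lambda entry: entry[2], reverse=True)\n",
    "    return ranked[:top_k]\n",
    "\n",
    "most_probable_pairs(pair_probabilities, top_k=2)"
   ]
  },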
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Thermodynamic model\n",
    "```python\n",
    "linear_partition_v(rna_sequence, beam_size = 100, bp_cutoff = 0.0, no_sharp_turn = True)\n",
    "```\n",
    "This function uses the thermodynamic model from [Vienna RNAfold](https://almob.biomedcentral.com/articles/10.1186/1748-7188-6-26).\n",
    "The parameters are the same as those of the machine learning model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(-1.9573111534118652,\n",
       " [(2, 15, 0.833134651184082),\n",
       "  (3, 14, 0.8365526795387268),\n",
       "  (4, 13, 0.8355389833450317)])"
      ]
     },
     "metadata": {},
     "execution_count": 10
    }
   ],
   "source": [
    "linear_rna.linear_partition_v(input_sequence, bp_cutoff = 0.5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.7.9 64-bit",
   "metadata": {
    "interpreter": {
     "hash": "ffeddd7d1ba530e12d9eb06e10b8b56b9c925e18ed1aeb11a97f0d6e21c5108a"
    }
   }
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9-final"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}